
DOI: 10.1201/9781003355205-7

Chapter 7

Targeted Gene Metagenomic Data Analysis

7.1. INTRODUCTION TO METAGENOMICS

We can use any of the sequencing applications discussed in the previous chapter to study an individual bacterium. Metagenomics, by contrast, studies the genomes of an entire community of bacteria recovered from environmental or clinical samples, in order to learn which microbial species are present in the sample and how they affect other living organisms. The samples most commonly targeted by metagenomics include human skin and body cavities, the digestive system, water, plants, soil, waste (liquid and solid, including feces), and food products. Metagenomics has a variety of uses, including identification of an unknown pathogen in a disease outbreak; clinical diagnosis; monitoring of human and animal health; identification of bioactive compounds (terragines, violacein, and indirubin) [1]; discovery of drugs from marine microorganisms, such as cytarabine (anti-cancer) [2], cephalosporins (antimicrobial) [3], and vidarabine (antiviral) [2]; discovery of novel antibiotics; production of enzymes (lipases, proteases, lyases, amylases, etc.); exploration of new industrial and health products (indigo, probiotics) [4]; and investigation and monitoring of wildlife health.

In a typical metagenomics study, the genomic DNA of the whole bacterial community in a sample is extracted and sequenced, and a pipeline of analyses is then run on the resulting sequence data. There are two sequencing approaches for characterizing the microbial taxonomic groups in environmental samples. The first targets a specific marker gene, or a region of a marker gene, after it has been amplified by polymerase chain reaction (PCR). The 16S rRNA gene is often used for this purpose. The second approach targets all bacterial genomes in the sample by shotgun whole-genome sequencing. We will discuss shotgun sequencing in the next chapter. In this chapter, we focus on amplicon-based sequencing to identify bacteria in environmental or clinical samples.
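To make the idea of targeting a region of a marker gene more concrete, the following is a minimal Python sketch of an "in silico PCR": it locates a forward primer and the binding site of a reverse primer in a template sequence and returns the region between them, primers included. The template and both primers are toy sequences invented for illustration, not real 16S rRNA primers, and the function names are hypothetical. The sketch also assumes exact matching apart from IUPAC degenerate bases, whereas real primer binding tolerates some mismatches.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes,
# so that degenerate primer bases (e.g. Y = C or T) can be matched.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement table that also handles degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_to_regex(primer):
    """Translate a (possibly degenerate) primer into a regex pattern."""
    return "".join(IUPAC[base] for base in primer.upper())


def extract_amplicon(template, forward, reverse):
    """Return the amplified region (primers included), or None.

    The reverse primer anneals to the opposite strand, so on the given
    strand we search for its reverse complement, downstream of the
    forward primer site.
    """
    template = template.upper()
    fwd = re.search(primer_to_regex(forward), template)
    if fwd is None:
        return None
    rev_pattern = re.compile(primer_to_regex(reverse_complement(reverse)))
    rev = rev_pattern.search(template, fwd.end())
    if rev is None:
        return None
    return template[fwd.start():rev.end()]


# Toy example (hypothetical sequences, chosen only to exercise the code).
toy_template = "TTTTACGTACGGGGCCCCTTGACAAAAA"
toy_forward = "ACGYAC"   # Y matches C or T
toy_reverse = "TGTCAA"   # reverse complement of the site TTGACA
print(extract_amplicon(toy_template, toy_forward, toy_reverse))
```

In an actual amplicon study the amplification is done in the laboratory and the reads already correspond to the targeted region; a search like this is mainly useful for checking which region a primer pair would cover in a reference sequence.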